Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Improve the unresolved-import check #13007

Merged
merged 3 commits into from
Aug 21, 2024

Conversation

AlexWaygood
Copy link
Member

Summary

This PR improves the unresolved-import check so that it once again is able to recognise unresolved import from imports. It adds the same tests as #12986, but doesn't add the UnknownTypeKind enum that that PR introduces.

I've had to comment out some assertions in the tests for now. One case that passes with #12986 but doesn't work with this PR is this:

  • src/a.py: import foo as foo
  • src/b.py: from a import foo

This should only cause an unresolved-import diagnostic to be emitted on a.py, and this is the case with #12986, but with this PR unresolved-import diagnostics are emitted on both files.

Another case that passes with #12986 but not with this PR is the case where a symbol that exists, but has an inferred type of Unknown, is imported from another module. This should not trigger an unresolved-import diagnostic, but does with this PR, as it's hard to determine the root cause of the Unknown type.

Overall, this PR feels to me like it's an improvement on the status quo, but inferior to #12986. It's also a less significant change than #12986, however, since it's not a significant change to the design of how we think about types in red-knot, so the idea is that it should be possible to land this and then rebase #12986 on top of it.

Test Plan

Several tests added to crates/red_knot_python_semantic/src/types.rs and crates/red_knot_python_semantic/src/types/infer.rs. The assertions in the benchmarks have also been updated.

@AlexWaygood AlexWaygood added the red-knot Multi-file analysis & type inference label Aug 20, 2024
@AlexWaygood AlexWaygood force-pushed the alex/redknot-import-from branch from 7883c7a to 96dbfdd Compare August 20, 2024 13:16
Copy link
Contributor

github-actions bot commented Aug 20, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

@AlexWaygood AlexWaygood force-pushed the alex/redknot-import-from branch from 96dbfdd to 36dca2a Compare August 20, 2024 13:34
@@ -977,11 +996,34 @@ impl<'db> TypeInferenceBuilder<'db> {
// the runtime error will occur immediately (rather than when the symbol is *used*,
// as would be the case for a symbol with type `Unbound`), so it's appropriate to
// think of the type of the imported symbol as `Unknown` rather than `Unbound`
let ty = module_ty
let member_ty = module_ty
.unwrap_or(Type::Unbound)
.member(self.db, &Name::new(&name.id))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still find it odd that we have to do the "unbound" check everywhere where we call .member. Do we not always want to emit a diagnostic when failing to resolve a member?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do, but it will need to be different error codes. It would be very counterintuitive for users if import foo triggers an unresolved-import error code and from foo import bar triggers an unresolved-member error code. Users will expect all import-related failures to be the same error code and all attribute-access-related failures to be a different error code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. We might just want to parameterize the member method. But maybe that's not worth it? I don't know. Have you looked at how mypy/pyright do this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might just want to parameterize the member method.

Not sure exactly what you mean; could you clarify?

Have you looked at how mypy/pyright do this?

Mypy has a very different architecture to the one we're building. It resolves all imports eagerly in a first pass before it does any other type checking; module-not-found errors are emitted during this first pass here: https://github.com/python/mypy/blob/fe15ee69b9225f808f8ed735671b73c31ae1bed8/mypy/build.py#L2600-L2700.

I'm not as familiar with pyright's codebase but I can dig in now

Copy link
Member Author

@AlexWaygood AlexWaygood Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... it seems both mypy and pyright in fact do what you'd prefer here -- they both use "attribute-access" error codes for from x import y imports:

Pyright's error code is reportAttributeAccessIssue; mypy's is attr-defined, which both seem equally bad. In terms of error messages, pyright definitely wins, though: pyright has:

"foo" is unknown import symbol

whereas mypy has

Module "collections.abc" has no attribute "foo" 

I'm somewhat surprised by this. But given this precedent, I'm okay with emitting the diagnostic from .member(), if you'd prefer. Though I hope we still aspire to be more user-friendly in our error messages than either mypy or pyright.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I agree that that makes a lot of sense! I can make that change

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... if the inferred type of a member access is a union, e.g. int | Unbound, would you expect that to be represented as an Ok() or an Err() variant (if we return an Result from .member()?

Copy link
Member Author

@AlexWaygood AlexWaygood Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having a closure or similar to create a diagnostic if the member is unbound.

This might be doable, but after looking at it for a little bit I think it would still be quite a large refactor and design change. I think it's out of the scope of this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced there is any problem with reporting an error on from foo import bar (where the module foo does exist, but has no top-level symbol bar) as an attribute-access error code. If I had to pick between the pyright or mypy error messages above, I would prefer the mypy one (the pyright one gives less useful context). The fact that the attribute access occurred as part of a from-import will be clear from the error location; it doesn't have to be part of the error message or code. I wouldn't want to make design decisions based on the hypothesis that this is confusing to users without clear evidence that it is (e.g. a history of users filing issues on mypy about that error code or message); personally I find it intuitive.

Copy link
Contributor

@carljm carljm Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should return an Option or Result from member(), for the reason @AlexWaygood mentions above: we can have a possibly-unbound name (currently represented as a union with Unbound), and it's not clear how that should look in a binary representation.

We could design our own representation for "Type plus definitely-bound/maybe-unbound/definitely-unbound state" and use that instead of Type::Unbound. This will complicate some APIs and I think be somewhat less efficient, but it would have the advantage that it would force handling Unbound early, which I think is desirable. In general I don't think we want Unbound escaping from the scope where it originated; it should instead result in a diagnostic and be eliminated from the type (replaced with Unknown if we have no other type information; probably just disappear from the type if we do.) I do think this alternative to Type::Unbound is worth exploring and seeing what kind of impact it will have.

I was thinking that we would push diagnostics inside member() to simplify the handling of Unbound, and I think I still favor that approach. If we do want to make distinctions in error codes/messages based on context from the caller of member(), I think we can always pass in context to make that possible.

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still have preferred to implement a holistic solution to how we handle unresolved members rather than implementing a one off solution because I'm worried that we (I for example) end up copying this pattern. That was also the reason why I intentionally didn't implement this diagnostic in my original PR.

I'm okay with merging if you find the parity important but I ask you to at least add a TODO comment to make it clear that this pattern should not be copied.

crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved
@AlexWaygood
Copy link
Member Author

I would still have preferred to implement a holistic solution to how we handle unresolved members

Well, so would I! #12986 was my attempt 😆 The idea behind this PR was to try to implement as many of the fixes implemented there as possible without fundamentally reworking red-knot's design, so that it would be clearer exactly what the pros and cons of doing so would be.

I ask you to at least add a TODO comment to make it clear that this pattern should not be copied.

I'll add this

AlexWaygood and others added 2 commits August 21, 2024 13:46
Update crates/red_knot_python_semantic/src/types/infer.rs

Co-authored-by: Micha Reiser <micha@reiser.io>
@AlexWaygood AlexWaygood force-pushed the alex/redknot-import-from branch from 7e907f8 to ea3a449 Compare August 21, 2024 13:09
Copy link

codspeed-hq bot commented Aug 21, 2024

CodSpeed Performance Report

Merging #13007 will improve performances by 4.16%

Comparing alex/redknot-import-from (2c1fb1d) with main (785c399)

Summary

⚡ 1 improvements
✅ 31 untouched benchmarks

Benchmarks breakdown

Benchmark main alex/redknot-import-from Change
linter/default-rules[unicode/pypinyin.py] 354.7 µs 340.6 µs +4.16%

@AlexWaygood AlexWaygood force-pushed the alex/redknot-import-from branch from ea3a449 to fce1b32 Compare August 21, 2024 13:16
@AlexWaygood AlexWaygood force-pushed the alex/redknot-import-from branch from fce1b32 to 2c1fb1d Compare August 21, 2024 13:37
@AlexWaygood AlexWaygood enabled auto-merge (squash) August 21, 2024 13:39
@AlexWaygood AlexWaygood merged commit ecd9e6a into main Aug 21, 2024
17 checks passed
@AlexWaygood AlexWaygood deleted the alex/redknot-import-from branch August 21, 2024 13:44
let b_file_diagnostics = super::check_types(&db, b_file);
assert_diagnostic_messages(
&b_file_diagnostics,
&["Could not resolve import of 'thing' from 'a'"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I think describing it as a failed attribute access would actually be much clearer than the vague "Could not resolve" which doesn't really communicate anything! This message also doesn't currently clarify that 'a' is a module (though it could be taken as implied.) I actually can't think of an error message here that I think would be better than "Module a has no attribute thing."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do people really think of from collections import deque as an "attribute access" of deque from the collections module? I certainly don't. Of course that's what's happening under the hood, but even at runtime this detail is hidden -- the import fails with ImportError rather than AttributeError, and there's no mention of attempting to access an attribute:

% python -c "from collections import foo"                
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: cannot import name 'foo' from 'collections' (/Users/alexw/.pyenv/versions/3.12.4/lib/python3.12/collections/__init__.py)

Comment on lines +460 to +461
A spurious second 'Unresolved import' diagnostic message is emitted on `b.py`, \
despite the symbol existing in the symbol table for `a.py`"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think when we issue the error on the import in a.py, we should set the type of foo in a to Unknown, not Unbound, which should avoid this?

&self,
tail: Option<&str>,
level: NonZeroU32,
) -> Result<ModuleName, ModuleResolutionError> {
Copy link
Contributor

@carljm carljm Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just emit the diagnostics inside this function, return an Option, and avoid the need for ModuleResolutionError?

module.unwrap_or_default()
),
);
} else if module_ty.is_ok() && member_ty.is_unknown() {
Copy link
Contributor

@carljm carljm Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should be emitting a diagnostic at all if the imported symbol is of type Unknown. Unknown might result from an error in the other module, but in that case the diagnostic should occur in the other module; importing a value of unknown type is not an error.

I think just removing this clause would fix the ignored test; what would it break?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the only reason we have to check for Unknown here is that we intentionally replaced Unbound with Unknown just a few lines above. If we wait to do it after, the bug in the ignored test goes away, and all tests pass:

diff --git a/crates/red_knot_python_semantic/src/types.rs b/crates/red_knot_python_semantic/src/types.rs
index 173c957d1..c4ff40220 100644
--- a/crates/red_knot_python_semantic/src/types.rs
+++ b/crates/red_knot_python_semantic/src/types.rs
@@ -451,9 +451,6 @@ mod tests {
         );
     }

-    #[ignore = "\
-A spurious second 'Unresolved import' diagnostic message is emitted on `b.py`, \
-despite the symbol existing in the symbol table for `a.py`"]
     #[test]
     fn resolved_import_of_symbol_from_unresolved_import() {
         let mut db = setup_db();
diff --git a/crates/red_knot_python_semantic/src/types/infer.rs b/crates/red_knot_python_semantic/src/types/infer.rs
index 9b183727e..28dc8f2e3 100644
--- a/crates/red_knot_python_semantic/src/types/infer.rs
+++ b/crates/red_knot_python_semantic/src/types/infer.rs
@@ -1012,15 +1012,9 @@ impl<'db> TypeInferenceBuilder<'db> {
             asname: _,
         } = alias;

-        // If a symbol is unbound in the module the symbol was originally defined in,
-        // when we're trying to import the symbol from that module into "our" module,
-        // the runtime error will occur immediately (rather than when the symbol is *used*,
-        // as would be the case for a symbol with type `Unbound`), so it's appropriate to
-        // think of the type of the imported symbol as `Unknown` rather than `Unbound`
         let member_ty = module_ty
             .unwrap_or(Type::Unbound)
-            .member(self.db, &Name::new(&name.id))
-            .replace_unbound_with(self.db, Type::Unknown);
+            .member(self.db, &Name::new(&name.id));

         if matches!(module_ty, Err(ModuleResolutionError::UnresolvedModule)) {
             self.add_diagnostic(
@@ -1032,7 +1026,7 @@ impl<'db> TypeInferenceBuilder<'db> {
                     module.unwrap_or_default()
                 ),
             );
-        } else if module_ty.is_ok() && member_ty.is_unknown() {
+        } else if module_ty.is_ok() && member_ty.is_unbound() {
             self.add_diagnostic(
                 AnyNodeRef::Alias(alias),
                 "unresolved-import",
@@ -1044,6 +1038,8 @@ impl<'db> TypeInferenceBuilder<'db> {
             );
         }

+        let member_ty = member_ty.replace_unbound_with(self.db, Type::Unknown);
+
         self.types.definitions.insert(definition, member_ty);
     }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I think you're right that this fixes the tests I added (and skipped) in this PR. Here's an interesting edge case that you might not have considered, though. What should we do for something like this?

# foo.py
for x in range(0):
    pass

# bar.py
from foo import x

Running python bar.py fails with ImportError at runtime because x is not defined in foo.py (due to the iterable being empty), but neither mypy nor pyright catches this. Arguably we should infer the type of x after the for loop has completed as being int | Unbound -- should we flag any union that includes Unbound as one of the elements with an unresolved-import diagnostic? Probably not, I don't think -- the import might succeed, it just also might not. So let's say (for argument's sake) we use the error code import-possibly-unbound for that.

So then what if we have something like this, where the symbol definitely exists in foo.py, but is also definitely unbound? (Mypy and pyright do both catch this one.) Should that have the error code unresolved-import, or import-possibly-unbound?

# foo.py
x: int

# bar.py
from foo import x

These are somewhat obscure edge cases, so I think what you propose here is a strict improvement on what I implemented in this PR; I'll make a fixup PR to address your suggestions. But my point is that we may still at some point need some context propagated regarding the cause of the error -- whether that error is represented as Unknown or Unbound.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, great points and great questions! I had the x: int case in mind last night when thinking again about representation of unbound-ness in a different comment on this PR.

I agree that maybe-unbound is a different error class than definitely-unbound (and arguable whether maybe-unbound should trigger an error on import at all.)

I think we do want a representation of "definitely unbound and typed" that we use for imports, so e.g. we can error if you import from x: int (as long as it's not a stub), but we can still type check the importing module treating x as int rather than Unknown. (Not sure how important this is, but it seems marginally better.) I think this can be handled by declared vs inferred types.

What's less clear to me is if we need a representation of "definitely unbound and typed" for inferred types. It would mean that within a scope we could also check code following x: int and emit diagnostics both for "x is not bound" and for use of x that isn't valid for an int. Maybe that's useful?

Like I mentioned above in a different comment, I'm pretty open to exploring other representations of unbound to make it orthogonal to types; the main question for me is if we can avoid it regressing in perf, and if not, do we gain enough from it to be worth the regression?

}
&self,
module_name: ModuleName,
) -> Result<Type<'db>, ModuleResolutionError> {
Copy link
Contributor

@carljm carljm Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the advantage of returning a Result vs just emitting the diagnostic directly in this method, as the previous code did.

@carljm
Copy link
Contributor

carljm commented Aug 22, 2024

While I'm making notes on this PR of things that I'd like to address in this area: we've now started to accumulate some tests in types.rs that (as far as I can see) are indistinguishable in purpose and kind from the tests we already have in infer.rs. Maybe all of the tests should be moved out to types.rs instead of infer.rs, or should actually live in their own file (or better should be in a more concise testing framework), but I don't think we should be accumulating these tests in two different files without a clear rationale for which tests go in which file.

@AlexWaygood
Copy link
Member Author

While I'm making notes on this PR of things that I'd like to address in this area: we've now started to accumulate some tests in types.rs that (as far as I can see) are indistinguishable in purpose and kind from the tests we already have in infer.rs. Maybe all of the tests should be moved out to types.rs instead of infer.rs, or should actually live in their own file (or better should be in a more concise testing framework), but I don't think we should be accumulating these tests in two different files without a clear rationale for which tests go in which file.

I think the idea is that the tests in types.rs are specifically testing which diagnostics are emitted, whereas the tests in infer.rs are (currently) solely focussed on which types are inferred

@MichaReiser
Copy link
Member

MichaReiser commented Aug 22, 2024

I think the idea is that the tests in types.rs are specifically testing which diagnostics are emitted, whereas the tests in infer.rs are (currently) solely focussed on which types are inferred

My initial idea was to only test that typecheck emits diagnostic and I picked the only existing diagnostic. My intention isn't to test specific diagnostics.

@carljm
Copy link
Contributor

carljm commented Aug 22, 2024

I think the idea is that the tests in types.rs are specifically testing which diagnostics are emitted, whereas the tests in infer.rs are (currently) solely focussed on which types are inferred

Ah! Yes, I see that distinction does seem to apply currently. I think longer-term it doesn't necessarily make sense to split tests according to this distinction, though. At some point I suspect ~all of our tests will be tests of diagnostics, which can include a diagnostic from reveal_type() as one edge case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants